#delete duplicate rows in python
unpopularly-opinionated · 4 months ago
Text
This is a bit of a brag post, but in my quest to archive digital media to keep for myself, I’ve successfully cobbled together a program that automatically goes through Marvel comics on their website, screenshots individual pages, saves them into a folder, and moves on to the next one. It took me all night but fuck it, it works now. Still has one minor kink to work out but I’m gonna save that for tomorrow.
But yeah, they made it rather difficult for me because I thought it was going to be as simple as saving the images as they load into my browser, but instead of actually loading the images, Marvel has created this little applet in JS so the images aren’t so much being loaded directly into my browser as they are the applet. I’m not sure if they meant it as anti-piracy, or if it’s just a coincidental bonus for them since I think the purpose of the applet is for their whole “guided reading” feature which goes through each page panel-by-panel to help you know what direction to read everything in I guess. Anyways, this would’ve been easy as shit for me because JS is probably the language I’m most comfortable with, but oh well, fuck me I guess. Pirating can’t be easy, now can it.
So alternatively I had to resort to screenshotting a specific portion of my screen and saving them as PNGs myself. This meant I had to use Python which… I know how to use, but fuck me if it isn’t always finicky to use. Dozens of pip errors to deal with because no matter how often I download Python, I apparently do it wrong every time. Anywho, their whole system is in JS, and it hangs a fair bit and can be slow as shit. Not to mention, there’s this obnoxious page turning animation that has to play out which IMO they should have a button to disable that seeing as it can cause the program to lag which wouldn’t be great for slower systems, but I digress. So I needed to throw in a bunch of sleep timers to keep the program running smoothly.
The hard part was actually having to interface with the browser in order to both copy the title of the comic book so I could name my folders after them, as well as clicking the button to load up the next comic. Seeing as how this website requires a login, loading it up on a proxy browser was kind of difficult, plus in all my years of attempting to fuck with proxy browsers I have NEVER gotten Selenium to work no matter what I do so that’s fun. So anyways, I said fuck that, and found a workaround using Chrome and their debug mode port. So now I have basically a debug-version of Chrome open on my screen which is able to be interfaced with via code. Pretty neat stuff that I learned today.
I would say downsides that I have to work through:
1. Even though I technically have it set to stop the program when I press ESC, I actually set it up wrong for the loop so I can’t easily end the loop once I’ve started it. Gotta kill the terminal for that. Oops.
2. The minor kink I mentioned is very rarely I encounter a two-page spread which isn’t accounted for since it’s programmed to screenshot a very specific section of my screen and that’s it. Unfortunately, in order to fix that, I’m likely going to have to deal with shit like pixel detection which I dabbled with eons ago but didn’t quite manage to get to do what I wanted it to do last time. Hopefully this time is different.
3. It’s also set with a hard number of 25 screenshots per comic because I found on average (at least for the series I’m currently running it on) there were about 21-22 pages per issue. After that, it screenshots the end page several times in a row, but I have it set to determine based on hash values if there are duplicate photos in the folder and then delete those so that part is fine. It’s just that if I encounter a comic with more than 25 pages in it, the program is FUCKED as it will mess up all future iterations of the program. I mean on the bright side, ideally it’ll kill the program which conveniently solves problem #1 lmao.
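A rough sketch of the hash-based duplicate cleanup described in point 3 (the folder name, the use of MD5, and the PNG-only filter are assumptions for illustration, not details from the actual script):

import hashlib
import os

def remove_duplicate_screenshots(folder):
    # Hash every PNG in the folder; delete any file whose hash was already seen.
    seen = set()
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".png"):
            continue
        path = os.path.join(folder, name)
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        if digest in seen:
            os.remove(path)  # duplicate of an earlier screenshot
        else:
            seen.add(digest)

remove_duplicate_screenshots("comic_issue_folder")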
2 notes · View notes
fromdevcom · 5 months ago
Text
Pandas DataFrame Cleanup: Master the Art of Dropping Columns

Data cleaning and preprocessing are crucial steps in any data analysis project. When working with pandas DataFrames in Python, you'll often encounter situations where you need to remove unnecessary columns to streamline your dataset. In this comprehensive guide, we'll explore various methods to drop columns in pandas, complete with practical examples and best practices.

Understanding the Basics of Column Dropping

Before diving into the methods, let's understand why we might need to drop columns:
Remove irrelevant features that don't contribute to analysis
Eliminate duplicate or redundant information
Clean up data before model training
Reduce memory usage for large datasets

Method 1: Using drop() - The Most Common Approach

The drop() method is the most straightforward way to remove columns from a DataFrame. Here's how to use it:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'name': ['John', 'Alice', 'Bob'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Paris'],
    'temp_col': [1, 2, 3]
})

# Drop a single column
df = df.drop('temp_col', axis=1)

# Drop multiple columns
df = df.drop(['city', 'age'], axis=1)

The axis=1 parameter indicates we're dropping columns (not rows). Remember that drop() returns a new DataFrame by default, so we need to reassign it or use inplace=True.

Method 2: Using del Statement - The Quick Solution

For quick, permanent column removal, you can use Python's del statement:

# Delete a column using del
del df['temp_col']

Note that this method modifies the DataFrame directly and cannot be undone. Use it with caution!

Method 3: Drop Columns Using pop() - Remove and Return

The pop() method removes a column and returns it, which can be useful when you want to store the removed column:

# Remove and store a column
removed_column = df.pop('temp_col')

Advanced Column Dropping Techniques

Dropping Multiple Columns with Pattern Matching

Sometimes you need to drop columns based on patterns in their names:

# Drop columns that start with 'temp_'
df = df.drop(columns=df.filter(regex='^temp_').columns)

# Drop columns that contain certain text
df = df.drop(columns=df.filter(like='unused').columns)

Conditional Column Dropping

You might want to drop columns based on certain conditions:

# Drop columns with more than 50% missing values
threshold = len(df) * 0.5
df = df.dropna(axis=1, thresh=threshold)

# Drop columns of specific data types
df = df.select_dtypes(exclude=['object'])

Best Practices for Dropping Columns

Make a Copy First

df_clean = df.copy()
df_clean = df_clean.drop('column_name', axis=1)

Use Column Lists for Multiple Drops

columns_to_drop = ['col1', 'col2', 'col3']
df = df.drop(columns=columns_to_drop)

Error Handling

try:
    df = df.drop('non_existent_column', axis=1)
except KeyError:
    print("Column not found in DataFrame")

Performance Considerations

When working with large datasets, consider these performance tips:

Use inplace=True to avoid creating copies:

df.drop('column_name', axis=1, inplace=True)

Drop multiple columns at once rather than one by one:

# More efficient
df.drop(['col1', 'col2', 'col3'], axis=1, inplace=True)

# Less efficient
df.drop('col1', axis=1, inplace=True)
df.drop('col2', axis=1, inplace=True)
df.drop('col3', axis=1, inplace=True)

Common Pitfalls and Solutions

Dropping Non-existent Columns

# Use errors='ignore' to skip non-existent columns
df = df.drop('missing_column', axis=1, errors='ignore')

Chain Operations Safely

# Use method chaining carefully
df = (df.drop('col1', axis=1)
        .drop('col2', axis=1)
        .reset_index(drop=True))

Real-World Applications

Let's look at a practical example of cleaning a dataset:

# Load a messy dataset
df = pd.read_csv('raw_data.csv')

# Clean up the DataFrame
df_clean = (df.drop(columns=['unnamed_column', 'duplicate_info'])      # Remove unnecessary columns
              .drop(columns=df.filter(regex='^temp_').columns)         # Remove temporary columns
              .drop(columns=df.columns[df.isna().sum() > len(df)*0.5]) # Remove columns with >50% missing values
           )

Integration with Data Science Workflows

When preparing data for machine learning:

# Drop target variable from features
X = df.drop('target_variable', axis=1)
y = df['target_variable']

# Drop non-numeric columns for certain algorithms
X = X.select_dtypes(include=['float64', 'int64'])

Conclusion

Mastering column dropping in pandas is essential for effective data preprocessing. Whether you're using the simple drop() method or implementing more complex pattern-based dropping, understanding these techniques will make your data cleaning process more efficient and reliable. Remember to always consider your specific use case when choosing a method, and don't forget to make backups of important data before making permanent changes to your DataFrame. Now you're equipped with all the knowledge needed to effectively manage columns in your pandas DataFrames. Happy data cleaning!
0 notes
data-science-lovers · 3 years ago
Text
Dealing with Duplicate Rows in Big-Data
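A minimal pandas sketch of removing duplicate rows, assuming the data fits in memory as a DataFrame (the column names and values are made up for illustration):

import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3],
    "city": ["Delhi", "Delhi", "Mumbai", "Pune", "Pune"],
})

# Drop rows that are identical across every column, keeping the first occurrence
df = df.drop_duplicates()

# Or treat rows as duplicates based only on selected columns
df = df.drop_duplicates(subset=["id"], keep="last").reset_index(drop=True)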
0 notes
coolwizardprince · 3 years ago
Text
Using pg chameleon to Migrate Data from MySQL to openGauss
Introduction to pg_chameleon
pg_chameleon is a real-time replication tool compiled in Python 3 for migrating data from MySQL to PostgreSQL. The tool uses the mysql-replication library to extract row images from MySQL. The row images are stored in PostgreSQL in JSONB format.
A pl/pgsql function in PostgreSQL is executed to decode row images in JSONB format and replay the changes to PostgreSQL. In addition, the tool uses the read-only mode to pull full data from MySQL to PostgreSQL through initial configuration. In this way, the tool provides the function of copying the initial full data and subsequent incremental data online in real time.
pg_chameleon has the following features:
Provides online real-time replication by reading the MySQL BinLog.
Supports reading data from multiple MySQL schemas and restoring the data to the target PostgreSQL database. The source schemas and target schemas can use different names.
Implements real-time replication through a daemon. The daemon consists of two subprocesses. One is responsible for reading logs from MySQL, and the other is responsible for replaying changes to PostgreSQL.
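As an illustration only (this is not pg_chameleon's actual code), the reader/replayer split described above can be pictured as two cooperating processes connected by a queue:

import multiprocessing as mp

def read_mysql_binlog(queue):
    # Stand-in for the subprocess that tails the MySQL binlog
    # and hands row images over for replay.
    for event in ["row image 1", "row image 2"]:
        queue.put(event)
    queue.put(None)  # sentinel: no more events

def replay_to_postgres(queue):
    # Stand-in for the subprocess that decodes row images
    # and applies the changes to the target database.
    while True:
        event = queue.get()
        if event is None:
            break
        print("replaying:", event)

if __name__ == "__main__":
    q = mp.Queue()
    reader = mp.Process(target=read_mysql_binlog, args=(q,))
    replayer = mp.Process(target=replay_to_postgres, args=(q,))
    reader.start()
    replayer.start()
    reader.join()
    replayer.join()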
openGauss is compatible with PostgreSQL communication protocols and most syntaxes. For this reason, you can use pg_chameleon to migrate data from MySQL to openGauss. In addition, the real-time replication capabilities of pg_chameleon greatly reduce the service interruption duration during database switchover.
pg_chameleon Issues in openGauss
pg_chameleon depends on the psycopg2 driver, and the psycopg2 driver uses the pg_config tool to check the PostgreSQL version and prevents earlier PostgreSQL versions from using the driver. The pg_config tool of openGauss returns the version of openGauss (the current version is openGauss 2.0.0). As a result, the driver reports the version error “Psycopg requires PostgreSQL client library (libpq) >= 9.1”. You need to build psycopg2 from source code and remove the related restrictions in the source header file psycopg/psycopg.h.
pg_chameleon sets the GUC parameter LOCK_TIMEOUT to limit the timeout for waiting for locks in PostgreSQL. openGauss does not support this parameter. (openGauss supports the GUC parameter lockwait_timeout, which needs to be set by the administrator.) You need to delete related settings from the source code of pg_chameleon.
pg_chameleon uses the UPSERT syntax to specify the replacement operation when a constraint is violated. The function and syntax of the UPSERT statement supported by openGauss differ from those supported by PostgreSQL. openGauss uses the ON DUPLICATE KEY UPDATE { column_name = { expression | DEFAULT } } [, …] syntax, while PostgreSQL uses the ON CONFLICT [ conflict_target ] DO UPDATE SET { column_name = { expression | DEFAULT } } syntax. You need to modify the related UPSERT statements in the source code of pg_chameleon.
pg_chameleon uses the CREATE SCHEMA IF NOT EXISTS and CREATE INDEX IF NOT EXISTS syntaxes. openGauss does not support the IF NOT EXISTS option of schemas and indexes. You need to modify the logic so that the system checks whether the schemas and indexes exist before creating them.
To select the array range, openGauss runs column_name[start, end], while PostgreSQL runs column_name[start:end]. You need to modify the array range selection mode in the source code of pg_chameleon.
pg_chameleon uses the INHERITS function, but openGauss does not support inherited tables. You need to modify the SQL statements and tables that use inherited tables.
Next, use pg_chameleon to migrate data from MySQL to openGauss.
Configuring pg_chameleon
pg_chameleon uses the config-example.yaml configuration file in ~/.pg_chameleon/configuration to define configurations during migration. The configuration file consists of four parts: global settings, type_override, postgres destination connection, and sources. global settings is used to set the log file path, log level, and others. type_override allows users to customize type conversion rules and overwrite existing default conversion rules. postgres destination connection is used to configure the parameters for connecting to openGauss. sources is used to define the parameters for connecting to MySQL and other configurable items during replication.
For more details about the configuration items, see the official website:
https://pgchameleon.org/documents_v2/configuration_file.html
The following is an example of the configuration file:

# global settings
pid_dir: '~/.pg_chameleon/pid/'
log_dir: '~/.pg_chameleon/logs/'
log_dest: file
log_level: info
log_days_keep: 10
rollbar_key: ''
rollbar_env: ''

# type_override allows the user to override the default type conversion
# into a different one.
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"

# postgres destination connection
pg_conn:
  host: "1.1.1.1"
  port: "5432"
  user: "opengauss_test"
  password: "password_123"
  database: "opengauss_database"
  charset: "utf8"

sources:
  mysql:
    db_conn:
      host: "1.1.1.1"
      port: "3306"
      user: "mysql_test"
      password: "password123"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      mysql_database: sch_mysql_database
    limit_tables:
    skip_tables:
    grant_select_to:
      - usr_migration
    lock_timeout: "120s"
    my_server_id: 1
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "disabled"
    gtid_enable: false
    type: mysql
    keep_existing_schema: No
The preceding configuration file indicates that the username and password for connecting to MySQL are mysql_test and password123 respectively during data migration. The IP address and port number of the MySQL server are 1.1.1.1 and 3306, respectively. The source database is mysql_database.
The username and password for connecting to openGauss are opengauss_test and password_123, respectively. The IP address and port number of the openGauss server are 1.1.1.1 and 5432, respectively. The target database is opengauss_database. The sch_mysql_database schema is created in opengauss_database, and all tables to be migrated are in this schema.
Note that the user must have the permission to remotely connect to MySQL and openGauss as well as the read and write permissions on the corresponding databases. For openGauss, the host where pg_chameleon runs must be in the remote access whitelist of openGauss. For MySQL, the user must have the RELOAD, REPLICATION CLIENT, and REPLICATION SLAVE permissions.
The following describes the migration procedure.
Creating Users and Databases
The following shows how to create the users and databases in openGauss required for migration.
The following shows how to create the users in MySQL required for migration and grant related permissions to the users.
Enabling the Replication Function of MySQL
Modify the MySQL configuration file. Generally, the configuration file is /etc/my.cnf or the cnf configuration file in the /etc/my.cnf.d/ folder. Modify the following configurations in the [mysqld] configuration block (if the [mysqld] configuration block does not exist, add it):

[mysqld]
binlog_format = ROW
log_bin = mysql-bin
server_id = 1
binlog_row_image = FULL
expire_logs_days = 10
After the modification, restart MySQL for the configurations to take effect.
Running pg_chameleon to Migrate Data
Create and activate a virtual Python environment.
python3 -m venv venv
source venv/bin/activate
Download and install psycopg2 and pg_chameleon.
Run the pip install pip --upgrade command to upgrade pip.
Add the folder where the pg_config tool of openGauss is located to the $PATH environment variable. Example:
export PATH={openGauss-server}/dest/bin:$PATH
Download the source code of psycopg2 at https://github.com/psycopg/psycopg2, remove the restriction of checking the PostgreSQL version, and run the python setup.py install command to compile the source code and install the tool.
Download the source code of pg_chameleon at https://github.com/the4thdoctor/pg_chameleon, solve the preceding issues in openGauss, and run the python setup.py install command to compile the source code and install the tool.
Create the configuration file directory of pg_chameleon.
chameleon set_configuration_files
Modify the configuration file of pg_chameleon.
cd ~/.pg_chameleon/configuration
cp config-example.yml default.yml
Modify the default.yml file as required. Modify the connection configuration information, user information, database information, and schema mapping specified by pg_conn and mysql. An example of the configuration file is provided for reference.
Initialize the replication stream.
chameleon create_replica_schema --config default
chameleon add_source --config default --source mysql
In this step, an auxiliary schema and table are created for the replication process in openGauss.
Copy basic data.
chameleon init_replica --config default --source mysql
After this step is complete, the current full data in MySQL is copied to openGauss.
You can view the replication result in openGauss.
Enable online real-time replication.
chameleon start_replica --config default --source mysql
After real-time replication is enabled, insert a data record into MySQL.
View the data in the test_decimal table in openGauss.
The newly inserted data record is successfully copied to openGauss.
Disable online replication.
chameleon stop_replica --config default --source mysql
chameleon detach_replica --config default --source mysql
chameleon drop_replica_schema --config default
0 notes
hydrus · 7 years ago
Text
Version 330
youtube
windows
zip
exe
os x
app
tar.gz
linux
tar.gz
source
tar.gz
I had a great week. There are some more login scripts and a bit of cleanup and speed-up.
The poll for what big thing I will work on next is up! Here are the poll + discussion thread:
https://www.poll-maker.com/poll2148452x73e94E02-60
https://8ch.net/hydrus/res/10654.html
login stuff
The new 'manage logins' dialog is easier to work with. It now shows when it thinks a login will expire, permits you to enter 'empty' credentials if you want to reset/clear a domain, and has a 'scrub invalid' button to reset a login that fails due to server error or similar.
After tweaking for the problem I discovered last week, I was able to write a login script for hentai foundry that uses username and pass. It should inherit the filter settings in your user profile, so you can now easily exclude the things you don't like! (the click-through login, which hydrus has been doing for ages, sets the filters to allow everything every time it works) Just go into manage logins, change the login script for www.hentai-foundry.com to the new login script, and put in some (throwaway) credentials, and you should be good to go.
I am also rolling out login scripts for shimmie, sankaku, and e-hentai, thanks to Cuddlebear (and possibly other users) on the github (which, reminder, is here: https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/tree/master/Download%20System ).
Pixiv seem to be changing some of their login rules, as many NSFW images now work for a logged-out hydrus client. The pixiv parser handles 'you need to be logged in' failures more gracefully, but I am not sure if that even happens any more! In any case, if you discover some class of pixiv URLs are giving you 'ignored' results because you are not logged in, please let me know the details.
Also, the Deviant Art parser can now fetch a sometimes-there larger version of images and only pulls from the download button (which is the 'true' best, when it is available) if it looks like an image. It should no longer download 140MB zips of brushes!
other stuff
Some kinds of tag searches (usually those on clients with large inboxes) should now be much faster!
Repository processing should also be faster, although I am interested in how it goes for different users. If you are on an HDD or have otherwise seen slow tag rows/s, please let me know if you notice a difference this week, for better or worse. The new system essentially opens the 'new tags m8' firehose pretty wide, but if that pressure is a problem for some people, I'll give it a more adaptable nozzle.
Many of the various 'select from a list of texts' dialogs across the program will now size themselves bigger if they can. This means, for example, that the gallery selector should now show everything in one go! The manage import/export folder dialogs are also moved to the new panel system, so if you have had trouble with these and a small screen, let me know how it looks for you now.
The duplicate filter page now has a button to edit your various duplicate merge options. The small button on the viewer was too-easily missed, so this should make it a bit easier!
full list
login:
added a proper username/password login script for hentai foundry--double-check your hf filters are set how you want in your profile, and your hydrus should inherit the same rules
fixed the gelbooru login script from last week, which typoed safebooru.com instead of .org
fixed the pixiv login 'link' to correctly say nsfw rather than everything, which wasn't going through last week right
improved the pixiv file page api parser to veto on 'could not access nsfw due to not logged in' status, although in further testing, this state seems to be rarer than previously/completely gone
added login scripts from the github for shimmie, sankaku, and e-hentai--thanks to Cuddlebear and any other users who helped put these together
added safebooru.donmai.us to danbooru login
improved the deviant art file page parser to get the 'full' embedded image link at higher preference than the standard embed, and only get the 'download' button if it looks like an image (hence, deviant art should stop getting 140MB brush zips!)
the manage logins panel now says when a login is expected to expire
the manage logins dialog now has a 'scrub invalidity' button to 'try again' a login that broke due to server error or similar
entering blank/invalid credentials is now permitted in the manage logins panel, and if entered on an 'active' domain, it will additionally deactivate it automatically
the manage logins panel is better at figuring out and updating validity after changes
the 'required cookies' in login scripts and steps now use string match names! hence, dynamically named cookies can now be checked! all existing checks are updated to fixed-string string matches
improved some cookie lookup code
improved some login manager script-updating code
deleted all the old legacy login code
misc login ui cleanup and fixes
.
other:
sped up tag searches in certain situations (usually huge inbox) by using a different optimisation
increased the repository mappings processing chunk size from 1k to 50k, which greatly increases processing in certain situations. let's see how it goes for different users--I may revisit the pipeline here to make it more flexible for faster and slower hard drives
many of the 'select from a list of texts' dialogs--such as when you select a gallery to download from--are now on the new panel system. the list will grow and shrink depending on its length and available screen real estate
.
misc:
extended my new dialog panel code so it can ask a question before an OK happens
fixed an issue with scanning through videos that have non-integer frame-counts due to previous misparsing
fixed an issue where file import objects that had been removed from the list but were still lingering on the list ui were not rendering their (invalid) index correctly
when export folders fail to do their work, the error is now presented in a better way and all export folders are paused
fixed an issue where the export files dialog could not boot if the most previous export phrase was invalid
the duplicate filter page now has a button to more easily edit the default merge options
increased the sibling/parent refresh delay from 1s to 8s
hydrus repository sync failures due to network login issues or manual network user cancellation will now be caught properly and a reasonable delay added
additional errors on repository sync will cause a reasonable delay on future work but still elevate the error
converted import folder management ui to the new panel system
refactored import folder ui code to ClientGUIImport.py
converted export folder management ui to the new panel system
refactored export folder ui code to the new ClientGUIExport.py
refactored manual file export ui code to ClientGUIExport.py
deleted some very old imageboard dumping management code
deleted some very old contact management code
did a little prep work for some 'show background image behind thumbs', including the start of a bitmap manager. I'll give it another go later
next week
I have about eight jobs left on the login manager, which is mostly a manual 'do login now' button on manage logins and some help on how to use and make login scripts in the system. I feel good about it overall and am thankful it didn't explode completely. Beyond finishing this off, I plan to continue doing small work like ui improvement and cleanup until the 12th December, when I will take about four weeks off over the holiday to update to python 3. In the new year, I will begin work on what gets voted on in the poll.
2 notes · View notes
margdarsanme · 5 years ago
Text
NCERT Class 12 Computer Science Chapter 4 Database Concepts
NCERT Class 12 Computer Science Python Solutions for Chapter 4 :: Database Concepts
Short Answer Type Questions-I
Question 1: Observe the following PARTICIPANTS and EVENTS tables carefully and write the name of the RDBMS operation which will be used to produce the output as shown in RESULT. Also, find the degree and cardinality of the RESULT.
Answer: Cartesian Product. Degree = 4, Cardinality = 6
Question 2: Define degree and cardinality. Also, based upon the given table, write its degree and cardinality.
Answer: Degree is the number of attributes or columns present in a table. Cardinality is the number of tuples or rows present in a table.
For the Patients table: Degree = 4, Cardinality = 5
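As an informal cross-check in Python (pandas here is purely an illustration and not part of the NCERT answer; the column names and values are assumed stand-ins for the Patients table):

import pandas as pd

patients = pd.DataFrame({
    "PatNo": [1, 2, 3, 4, 5],
    "PatName": ["A", "B", "C", "D", "E"],
    "Dept": ["ENT", "Ortho", "ENT", "Skin", "Ortho"],
    "DocID": [101, 102, 101, 103, 102],
})

degree = len(patients.columns)  # number of attributes/columns -> 4
cardinality = len(patients)     # number of tuples/rows        -> 5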
Question 3:Observe the following table and answer the parts (i) and (ii):
In the above table, can we have Qty as the primary key?
What is the cardinality and degree of the above table?
Answer:
We cannot use Qty as the primary key because its values are duplicated, and a primary key value cannot be duplicated.
Degree = 4, Cardinality = 5
Question 4: Explain the concept of union between two tables, with the help of an appropriate example.
Answer: The union operation, denoted by ‘U’, combines two or more relations. The resultant of the union operation contains tuples that are in either of the tables or in both tables.
Question 5: Observe the following STUDENTS and EVENTS tables carefully and write the name of the RDBMS operation which will be used to produce the output as shown in the LIST table. Also, find the degree and cardinality of the table.
Answer: Cartesian Product. Degree = 4, Cardinality = 6
Question 6:Observe the following MEMBER and ACTIVITY tables carefully and write the name of the RDBMS operation, which will be used to produce the output as shown in REPORT? Also, find the Degree and Cardinality of the REPORT.
Answer: Join operation on MEMBER and ACTIVITY.
Degree of REPORT = no. of columns (attributes) = 3
Cardinality of REPORT = no. of rows (tuples) = 6
Question 7:Observe the table ‘Club’ given below:
What is the cardinality and degree of the given table?
If a new column Contact_No has been added and three more members have joined the club, then what will be the cardinality and degree of the table?
Answer:
Cardinality = 4 Degree = 5
Cardinality = 7
Degree = 6
Question 8: What do you understand by Union & Cartesian product in relational algebra?
Answer:
Union of R and S: The union of two relations is a relation that includes all the tuples that are either in R or in S or in both R and S. Duplicate tuples are eliminated. The union is an operator which works on two sets. It combines the tuples of one relation with all the tuples of the other relation such that there is no duplication.
Cartesian Product: The cartesian product is an operator which works on two sets. It combines the tuples of one relation with all the tuples of the other relation.
Example: Cartesian Product
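A small illustration of both operations using Python (pandas DataFrames standing in for relations R and S; the sample values are assumptions):

import pandas as pd

r = pd.DataFrame({"id": [1, 2, 3], "name": ["A", "B", "C"]})
s = pd.DataFrame({"id": [3, 4], "name": ["C", "D"]})

# Union: tuples that are in R, in S, or in both, with duplicates eliminated
union = pd.concat([r, s]).drop_duplicates().reset_index(drop=True)  # 4 rows

# Cartesian product: every tuple of R paired with every tuple of S
product = r.merge(s, how="cross")  # 3 x 2 = 6 rows (requires pandas >= 1.2)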
Question 9: Differentiate between the Primary key and Alternate key of a table with the help of an example.
Answer:
Primary Key: A primary key is a value that can be used to identify a unique row in a table.
Alternate Key: An alternate key is any candidate key which is not selected to be the primary key.
Example:
So, (Bank Account Number, Aadhaar Number) are candidate keys for the table.
Aadhaar Number: Primary key
Bank Account Number: Alternate key
Question 10: Explain the concept of candidate key with the help of an appropriate example.
Answer: A candidate key is a column or set of columns that can identify records uniquely. For example, consider a table STUDENT. Here, AdmnNo and RollNo each identify its rows uniquely. Hence, they are candidate keys.
Question 11: What do you understand by degree & cardinality of a table?
Answer: Degree refers to the number of columns in a table. Cardinality refers to the number of rows.
Question 12: Observe the following table and answer parts (i) and (ii) accordingly.
In the above table, can we take Mno as the primary key? (Answer as [Yes/No] only.) Justify your answer with a valid reason.
What is the degree and the cardinality of the above table?
Answer:
No
Degree = 4, Cardinality = 5
[Hint: Because Pencil and Eraser have the same Mno = 2, and a primary key needs to be unique.]
Question 13: Give a suitable example of a table with sample data and illustrate Primary and Candidate keys in it.
Answer: A table may have more than one attribute or group of attributes that identifies a row/tuple uniquely; all such attributes are known as candidate keys. Out of the candidate keys, one is selected as the primary key.
Id = Primary key; Id and Qty = Candidate keys
Question 14: What do you understand by selection and projection operations in relational algebra?
Answer:
Projection (π): In relational algebra, projection is a unary operation. The result of the projection π{a1, …, an}(R) is the set obtained when the components of the tuples of R are restricted to the attribute set {a1, …, an} – it discards (or excludes) the other attributes.
Selection (σ): In relational algebra, a selection is a unary operation written as σ aθb (R) or σ aθv (R), where:
a and b are attribute names
θ is a binary comparison operator in the set {<, ≤, =, ≠, ≥, >}
v is a value constant
R is a relation
The selection σ aθb (R) selects all those tuples in R for which θ holds between the a attribute and the b attribute; σ aθv (R) selects the tuples for which θ holds between the a attribute and the value v.
Example: Selection and Projection
Question 15: What do you understand by Primary key and Candidate keys?
Answer: An attribute or set of attributes which is used to identify a tuple uniquely is known as the primary key. If a relation has more than one such attribute that identifies a tuple uniquely, then all such attributes are known as candidate keys.
Question 16: What is a relation? Define the relational data model.
Answer: A relation is a table having atomic values, unique rows, and unordered rows and columns. The relational model represents data and relationships among data by a collection of tables known as relations, each of which has a number of columns with unique names.
Question 17: Define domain with respect to a database. Give an example.
Answer: A domain is a pool of values from which the actual values appearing in a given column are drawn. For example, the values appearing in the Supp# column of both the Suppliers table and the Shipment table are drawn from the same domain.
Question 18:Expand the following:
SQL
DBMS
Answer:
SQL – Structured Query Language.
DBMS – Data Base Management System.
Question 19: What do you understand by candidate keys in a table? Give a suitable example of candidate keys from a table containing some meaningful data.
Answer: Candidate key: A candidate key is one that can identify each row of a table uniquely. Generally, a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.
Example:
Question 20: What are all the domain names possible in gender?
Answer: Male and Female
Question 21: A table ‘customer’ has 10 columns but no rows. Later, 10 new rows are inserted and 3 rows are deleted from the table. What is the degree and cardinality of the table customer?
Answer: Degree = 10 [no. of columns], Cardinality = 10 - 3 = 7 [no. of rows]
Question 22: A table ‘student’ has 3 columns and 10 rows, and another table ‘student 2’ has the same columns as student but 15 rows. 5 rows are common to both tables. If we take the union, what is the degree and cardinality of the resultant table?
Answer: Degree = 3, Cardinality = 20 (10 + 15 - 5)
Question 23: A table ‘student’ has 4 columns and 10 rows, and ‘student 2’ has 5 columns and 5 rows. If we take the Cartesian product of these two tables, what is the degree and cardinality of the resultant table?
Answer: Degree = 4 + 5 = 9 [no. of columns], Cardinality = 10 x 5 = 50 [no. of rows]
Question 24: In the following 2 tables, find the union of Student 1 and Student 2.
Answer:
0 notes
globalmediacampaign · 5 years ago
Text
How realtor.com® maximized data upload from Amazon S3 into Amazon DynamoDB
This is a customer post by Arup Ray, VP Data Technology at realtor.com®, and Daniel Whitehead, AWS Solutions Architect. Arup Ray would like to acknowledge Anil Pillai, Software Development Engineer at Amazon, for his pioneering contributions to this project during his former tenure at realtor.com® as Senior Principal Data Engineer.

realtor.com®, operated by Move, Inc., is in their own words, “a trusted resource for home buyers, sellers, and dreamers. It offers a robust database of for-sale properties across the U.S. and the information, tools, and professional expertise to help people move confidently through every step of their home journey.”

At realtor.com®, data and analytics are an important part of making the process of buying a home easier and more rewarding. As our customers search for properties, we identify the attributes of most interest to our customers and use that data to generate more tailored recommendations for similar houses within the area, to help our customers find their ideal new home. Personalized home suggestions are of critical importance to finding a customer’s dream home. This is why realtor.com® utilizes Amazon DynamoDB, a NoSQL database that allows for a flexible schema to house the customer analytics data sets, the basis for realtor.com’s recommendation engine. These data sets are created and updated by aggregating data from multiple upstream services, which are ingested into realtor.com’s analytics engine. There are tens of millions of nightly updates, which would take multiple hours to process if realtor.com® uploaded each item serially to DynamoDB using the PutItem API. Instead, realtor.com® created a system that segments the data set and takes advantage of the BatchWrite API, which allows us to concurrently upload 10-MB files across 25 concurrent data streams, accelerating realtor.com’s data ingestion from hours to minutes.

This post shows how realtor.com® uploaded hundreds of GB of data sets in parallel from Amazon S3 into DynamoDB using Amazon Athena and AWS Glue. This system increased the speed-to-market for realtor.com’s recommendation and personalization services API from hours to minutes.

Solution overview

At a high level, the solution includes the following steps:
Data is gathered from upstream sources that produce a file. This file contains millions of records and is then stored on S3 via the output of a batch job every night.
When the file lands in an S3 bucket, an Athena query is triggered by the object landing in the bucket to partition the large file into 25 smaller chunks, with each line having 16 MB of data, utilizing Athena’s Create Table As function.
Once the Athena queries are finished running, an AWS Glue job is initiated with multiple Spark workers that uploads the data in parallel into DynamoDB.
Once the process is complete, the files in the staging bucket are deleted.

The following diagram illustrates this workflow:

This solution uses AWS CLI, S3, Athena, AWS Glue, and DynamoDB. There is a cost associated with building out this pipeline.

Preparing the data store

The first step is to create the DynamoDB table to be the target destination for your data. Complete the following steps:
1. From the DynamoDB console, choose Create Table.
2. For Table name, enter target_table.
3. For Primary key, enter pk.
4. Select Use default settings and select Create Table. The following screenshot demonstrates steps 1–4.
5. Choose Create table.
6. Choose the table you created as seen in the screenshot below.
7. Choose Capacity.
8. Under Write capacity, for Minimum provisioned capacity, enter 5000 as shown in the screenshot below.
9. Choose Save.

This post uses DynamoDB’s automatic scaling feature to scale up the entries into the table. You must initially set this at a 5000 minimum to provide ample throughput for the BatchWrite operations and mimic the writes that occur on the table as part of daily operation. This allows your table to scale to a maximum throughput that increases the amount of writes until all the items are in the DynamoDB table.

Creating the data set

To simulate the pipeline, this post uses a subset of the New York City Taxi and Limousine Commission (TLC) Trip Record Data. You need an S3 bucket in the same Region as your DynamoDB table. For more information, see Create a Bucket. The first step is to copy a file of the data set to your bucket. You must set up permissions and access controls that allow you to upload a file to your bucket. After you have a secret key and access key configured into your CLI, complete the following steps:
1. Enter the following command into your terminal configured with the AWS Command Line Interface:
aws s3 cp s3://nyc-tlc/trip data/yellow_tripdata_2009-01.csv s3://
2. After the data is in S3, open the Athena console.
3. Choose Create a table.
4. Choose From S3 bucket data.
5. For Database, enter input_db.
6. For Table Name, enter input_raw.
7. For Location of Input Data Set, enter the location of your S3 bucket where you copied your data. The following screenshot demonstrates steps 5–7.
8. Choose Next.
9. On the data format screen, select Text File with Custom Delimiters as shown in the screenshot below.
10. Choose Next.
11. On the Columns screen, for Column Name, enter s3_data as shown below.
12. Choose Next.
13. Keep the defaults for your partition and choose Create Table.

The query editor is updated with a query that looks similar to the following code:

CREATE EXTERNAL TABLE IF NOT EXISTS input_db.input_raw (
  s3_data string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ' ',
  'field.delim' = ' ',
  'collection.delim' = '',
  'mapkey.delim' = ''
)
LOCATION 's3:///'
TBLPROPERTIES ('has_encrypted_data'='false');

14. Choose Run Query.

The resulting table pulls in two values for the Vendor Name and the Pickup Date over the 14 million rows in the data set. The raw data for this pipeline has now been created and the next step is to prepare this data to upload into DynamoDB.

Because this data set doesn’t have a unique identifier, you must create a unique partition key from the data. To create the primary key, take your raw data from Athena, make an md5 hash, and convert that into hex to grant a unique identifier for your rows. You can make sure that you don’t have any duplicates within your data by using the distinct operator. You do not need to apply this process to your data set if you have clean data with unique records (it is also not part of the realtor.com® pipeline).

On the Athena console, navigate to the query editor and enter the following code:

CREATE table input_for_ddb AS
SELECT DISTINCT
  to_hex(md5(to_utf8(s3_data))) AS primary_key,
  replace(substring(trim(s3_data), 1, 20), '', '') AS attribute
FROM input_raw
WHERE length(trim(s3_data)) > 5

This creates a new table with the prepared data. The next step is to shard the data using ntile, which is a window function to distribute rows of an ordered partition into equal groups. This splits a data set into smaller chunks and maximizes your ability to upload into DynamoDB.
Enter the following code:

CREATE table dynamodb_shards AS
SELECT
  primary_key,
  ntile(1000) OVER (ORDER BY primary_key) ntile_value
FROM input_for_ddb

The last step for preparing the data is to run a query that joins the data from the two previous tables you created and creates the final data set that is pushed to DynamoDB.

CREATE table push_to_ddb_data AS
SELECT
  dynamo_shards_table.ntile_value,
  ARRAY['primary_key', 'attribute'] AS meta_array,
  array_agg(ARRAY[
    coalesce(CASE WHEN length(trim(cast(input_for_ddb_table.primary_key AS varchar))) = 0 THEN NULL ELSE cast(input_for_ddb_table.primary_key AS varchar) END, 'NULL'),
    coalesce(CASE WHEN length(trim(cast(attribute AS varchar))) = 0 THEN NULL ELSE cast(attribute AS varchar) END, 'NULL')
  ]) AS data_array
FROM input_for_ddb AS input_for_ddb_table
JOIN dynamodb_shards dynamo_shards_table ON (input_for_ddb_table.primary_key = dynamo_shards_table.primary_key)
GROUP BY 1

Running the process

After this query is finished, complete the following steps:
On the AWS Glue console, under ETL, choose Jobs.
Choose Add Job.
In the AWS Glue Job configuration screen, name your job S3toDynamoDB.
Choose Create IAM role.
Choose Roles.
Choose Create a Role.
On the Create role screen, select Glue.
Choose Next: Permissions.
Choose Create a policy. A new window opens.
Choose JSON.
Enter the following policy (make sure you enter in your AWS account number):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueScriptPermissions",
      "Effect": "Allow",
      "Action": [
        "athena:BatchGetQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:GetQueryResultsStream",
        "athena:GetWorkGroup",
        "dynamodb:BatchWriteItem",
        "glue:GetTable",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:athena:*::workgroup/*",
        "arn:aws:dynamodb:*::table/target_table",
        "arn:aws:glue:*::catalog",
        "arn:aws:glue:*::database/input_db",
        "arn:aws:glue:*::table/input_db/push_to_ddb_data",
        "arn:aws:s3:::/*"
      ]
    },
    {
      "Sid": "Logs",
      "Effect": "Allow",
      "Action": [
        "athena:GetCatalogs",
        "logs:Create*",
        "logs:Put*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "Passrole",
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam:::role/AWSGlueServiceRole"
    }
  ]
}

Choose Next: Tags.
Choose Next: Review.
Name your policy AWSGlueServicePolicy.
Choose Create.
In the Role window, find your newly created policy.
Choose Next: Tags.
Choose Next: Review.
Name your role AWSGlueServiceRole.
Find this new role in the dropdown for your AWS Glue job.
For Type, choose Spark.
For Glue version, select Spark 2.2 Python 2.
For This job runs, select A new script to be authored by you.
Under Security configuration, script libraries, and job parameters, for Maximum capacity, enter 30. Leave everything else as default.
Choose Next.
On the next screen, choose Save job and edit script.
On the next screen, enter the following code (make sure to change to the region you are operating in):

#---- Glue PySpark script
from __future__ import print_function
import boto3
import time
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

#------------ Input section -----
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
DYNAMO_TABLE_NAME = 'target_table'
athena_db_name = 'input_db'
athena_table_name = 'push_to_ddb_data'

def generate_data(meta_list, data_list):
    for data in data_list:
        yield dict(zip(meta_list, data))

def push2ddb_batch_put(meta_list, data_list):
    try:
        dynamodb = boto3.resource('dynamodb', region_name=)
        table = dynamodb.Table(DYNAMO_TABLE_NAME)
        with table.batch_writer() as batch:
            for data in generate_data(meta_list, data_list):
                ndata = {k: v for k, v in data.items() if v != "NULL"}
                batch.put_item(Item=ndata)
        return len(data_list)
    except Exception as err:
        print("Error: while inserting data to DynamoDB...{err}".format(err=err))
        raise err

def insert_into_ddb(line):
    return push2ddb_batch_put(line['meta_array'], line['data_array'])

if __name__ == "__main__":
    try:
        # ---- Create the Glue Context and using Glue Context create a data frame
        glueContext = GlueContext(SparkContext.getOrCreate())
        src_data = glueContext.create_dynamic_frame.from_catalog(database=athena_db_name, table_name=athena_table_name)
        print("Count: ", src_data.count())
        src_data.printSchema()
        df1 = src_data.toDF()
        df1.cache()
        df1.show(10)
        start_time = time.time()
        print(df1.rdd.map(insert_into_ddb).collect())
        print(time.time() - start_time)
    except Exception as err:
        print("There is an error while inserting data to DDB...{err}".format(err=err))
        raise err

Choose Save.

In this script, you are reading the prepared data that you previously created with Athena. The chunks that were created are loaded into DynamoDB using the parallel processing capability of AWS Glue Spark workers in addition to the DynamoDB BatchWriteItem API.

Choose Run job.

The AWS Glue job takes some time to provision but after it is running, you can see the logs from the AWS Glue cluster. Throughput should be maximized and scale up dynamically as time goes on until the AWS Glue job has finished. The metrics in the DynamoDB Metrics tab should look similar to the following screenshot.

DynamoDB’s autoscaling feature began scaling based upon the number of writes the AWS Glue job was uploading into the DynamoDB table. At certain thresholds, the table increased the amount of write capacity to accommodate the target of 70% utilization on the table. AWS Glue had multiple concurrent writers that used the retry logic of the BatchWrite call within the AWS SDK, which made it so that even if there were a throttled request, it would eventually get written to the DynamoDB table by successful completion of the job.

The preceding screenshot shows that you uploaded 14 million items into DynamoDB in less than half an hour. In realtor.com’s case, this is a batch job that runs at one time during the day. With AWS Glue and DynamoDB, realtor.com® has a system that scales up dynamically with the amount of data that must be written to DynamoDB and scales down after the completion of the job without having to manage infrastructure.

Conclusion

This post demonstrated how to do the following:
Set up a DynamoDB table to land your data into.
Run multiple queries using Athena to prepare ingested raw data into a format that AWS Glue can parallelize.
Set up an AWS Glue job that you can invoke on demand, either triggered by an event or on a schedule, to upload into DynamoDB in parallel.

realtor.com® built out the Athena and AWS Glue to DynamoDB pipeline to lower overall management while allowing the system to scale dynamically. This decreases the amount of time it takes to update realtor.com’s analytic profiles, which further helps users find the house of their dreams.

About the Authors

Arup Ray is the VP of Engineering at realtor.com and heads the Data Technology team. The data team at realtor.com has been using AWS technologies to make data actionable for home buyers and real estate professionals.

Daniel Whitehead is a Solutions Architect with Amazon Web Services.

https://probdm.com/site/ODgwNg
0 notes
aksharasoftwares-blog · 7 years ago
Text
Best Oracle SQL Training HSR at Bangalore
Akshara Software Technologies is providing the best SQL training in HSR Layout, BTM Layout, and Koramangala with the most experienced professionals. Our trainer has been working with SQL and related technologies for more than 11 years in MNCs. We are offering SQL classes in Bangalore in a more practical way. We offer SQL classroom training in Bangalore, SQL online training, and SQL corporate training in Bangalore. We framed our syllabus to match real-world requirements, from beginner level to advanced level. SQL classes in HSR are conducted in weekday and weekend batches, both morning and evening, based on participants' requirements. We also offer fast-track SQL training in Bangalore and one-to-one SQL training in Bangalore.
Our participants will be eligible to clear all types of interviews at the end of our sessions. Our SQL classes in HSR are also focused on assisting with placements. Our SQL training course fees are very affordable compared to others. Our training includes SQL real-time classes in Bangalore, SQL live classes, and SQL real-time scenarios.
SQL Course Details:
Duration : 70-80 Hours (SQL+PL/SQL+Projects)
Demo and First 3 classes free
Real Time training with  hands on Project
Assignment and Case Studies
Week Day & Week End Batches
Oracle SQL Training in Bangalore –  SQL Syllabus in Detail: 35 Hours
Introduction to Oracle Database:         
Session and Transaction
Categorize the different types of SQL statements
Describe the data set used by the course
Log on to the database using SQL * PLUS/Toad environment
Save queries to files
Retrieve Data using the SQL SELECT Statement:
List the capabilities of SQL SELECT statements
Generate a report of data from the output of a basic SELECT statement
Select All Columns
Select Specific Columns
Use Column Heading Defaults
Use Arithmetic Operators
Understand Operator Precedence
Learn the DESCRIBE command to display the table structure
Handling Null Values
Literals and Concatenation to generate reports
Suppress Duplicate Rows
Learn to Restrict and Sort Data:
Write queries that contain a WHERE clause to limit the output retrieved
List the comparison operators and logical operators that are used in a WHERE clause
Describe the rules of precedence for comparison and logical operators
Use character string literals in the WHERE clause
Write queries that contain an ORDER BY clause to sort the output of a SELECT statement
Sort output in descending and ascending order
Usage of Single-Row Functions to Customize Output
Describe the differences between single row and multiple row functions
Manipulate strings with character function in the SELECT and WHERE clauses
Manipulate numbers with the ROUND, TRUNC, and MOD functions
Perform arithmetic with date data
Manipulate dates with the DATE functions
Invoke Conversion Functions and Conditional Expressions
Describe implicit and explicit data type conversion
Use the TO_CHAR, TO_NUMBER, and TO_DATE conversion functions
Nest multiple functions
Apply the NVL, NULLIF, and COALESCE functions to data
Use conditional IF THEN ELSE logic in a SELECT statement
 Aggregate Data Using the Group Functions
Use the aggregation functions to produce meaningful reports
Divide the retrieved data in groups by using the GROUP BY clause
Exclude groups of data by using the HAVING clause
Constraints
Different Types of constraints
Usage of Constraints
Creating relationship using Constraints.
Display Data From Multiple Tables Using Joins
Write SELECT statements to access data from more than one table
View data that generally does not meet a join condition by using outer joins
Join a table to itself by using a self join
Use Sub-queries to Solve Queries
Describe the types of problem that sub-queries can solve
Define sub-queries
List the types of sub-queries
Write single-row and multiple-row sub-queries
Multiple-Column Subqueries
Pairwise and Nonpairwise Comparison
Scalar Subquery Expressions
Solve problems with Correlated Subqueries
Update and Delete Rows Using Correlated Subqueries
The EXISTS and NOT EXISTS operators
The SET Operators
Describe the SET operators
Use a SET operator to combine multiple queries into a single query
Control the order of rows returned
Data Manipulation Statements
Describe each DML statement
Insert rows into a table
Change rows in a table by the UPDATE statement
Delete rows from a table with the DELETE statement
Save and discard changes with the COMMIT and ROLLBACK statements
OLAP Functions
RANK
DENSE_RANK
ROLLUP
CUBE
RATIO_TO_REPORT
LAG
LEAD
FIRST_VALUE
LAST_VALUE
Control User Access
Differentiate system privileges from object privileges
Create Users
Grant System Privileges
Create and Grant Privileges to a Role
Change Your Password
Grant Object Privileges
Revoke Object Privileges
Management of Schema Objects
Add, Modify, and Drop a Column
Add, Drop, and Defer a Constraint
How to enable and Disable a Constraint?
Create and Remove Indexes
Create a Function-Based Index
Create an External Table by Using ORACLE_LOADER and by Using ORACLE_DATAPUMP
Query External Tables
Manage Objects with Data Dictionary Views
Explain the data dictionary
Use the Dictionary Views
USER_OBJECTS and ALL_OBJECTS Views
Table and Column Information
Query the dictionary views for constraint information
Query the dictionary views for view, sequence, index and synonym information
Add a comment to a table
Query the dictionary views for comment information
Manipulate Large Data Sets
Use Subqueries to Manipulate Data
Retrieve Data Using a Subquery as Source
Insert Using a Subquery as a Target
Usage of the WITH CHECK OPTION Keyword on DML Statements
List the types of Multitable INSERT Statements
Use Multitable INSERT Statements
Merge rows in a table
Regular Expression Support
Use the Regular Expressions Functions and Conditions in SQL
Use Meta Characters with Regular Expressions
Perform a Basic Search using the REGEXP_LIKE function
Find patterns using the REGEXP_INSTR function
Extract Substrings using the REGEXP_SUBSTR function
Replace Patterns Using the REGEXP_REPLACE function
Implement the REGEXP_COUNT function
Partitions
Types Of partitions
Usage of partitions
Hierarchical Queries
Other Schema Objects
Create a simple and complex view
Retrieve data from views
Create and maintain indexes
Create private and public synonyms
MORE VISITS:
Software Testing Training in Bangalore | SAP FICO Training Institutes in Bangalore | SQL Training Institutes in Bangalore | Python Training Institutes in Bangalore | Selenium Training in Bangalore | iOS App Development Training Institutes in Bangalore | Automation Testing Training in Bangalore
0 notes
topicprinter · 8 years ago
Link
Personally, I believe Wordpress is far more powerful and valuable than some site operators and developers give it credit for. Sure, there are Ruby and Python and myriad ways to build a site -- but Wordpress sits eloquently at the confluence of simple to operate, actually possible to create a visual design that is above the bar of “good enough,” and most importantly, easy for a non-technical individual to operate, add content to and learn from.

Of course -- there are limits. Wordpress isn’t going to work if you’re just an idiot about making content look good, making content the same across your site and following some mental sense of design guidelines and taking the time to learn how clicking on this tool here -- does this -- results in this margin change -- impacts this layout -- looks like this on mobile -- etc.

My basic guide for getting a site from nothing to quality in Wordpress goes like this:

1. Install Wordpress on your site host (through your hosting provider or other -- if you have a current site and need to do this on a temp domain, you’ll need some extra steps). If you don’t know how to do this, your hosting provider should have clear instructions -- search on their site or Google ‘installing Wordpress [x host]’. Godaddy, WP-Engine and others make this fairly easy.

2. Go to Themeforest and view the most popular items filtered to Wordpress. I’ll save you some time -- it’s right here: http://themeforest.net/popular_item/by_category?category=wordpress

3. Start opening and viewing the themes here. Once you have a theme open, look at the various demo installs for that theme. Each theme will usually have between 10 and 100. Try to find one you like that you can see covering the look-and-feel of about 80-85% of your site. Personally, I wouldn’t recommend choosing a theme that is lower than the Top 12 in popularity. The top themes are the top for a reason -- high sales, active support and, most importantly for the non-technical, usually a built-in page-builder plugin. This is clutch -- a requirement, really. You should look for that plugin listed in the theme features -- it may be called Visual Composer, Cornerstone or a few others will generally be fine. I’m going to recommend Cornerstone over others currently (attached to ‘X theme’), because I’ve found it consistently easier to use for the non-technical.

4. Buy a theme. I’ve built sites on X, BeTheme, Jupiter, Salient -- they all work fine. Buy it and then download the zip.

5. Install the theme on your Wordpress installation in the themes section -- you’ll see an option to ‘add new’ theme -- either on your temp domain or live domain. You may have an issue uploading the .zip. You’re probably trying to install the wrong .zip. Unzip it and look, is it filled with another .zip and readme files? You want the one that is only the theme. This can be a jerk sometimes -- but keep playing at it, eventually, you’ll get a successful installation. If it keeps flopping and you’re feeling a little technically inept, grab someone on Fiverr that can install a theme and handle it that way.

ASIDE: Grabbing someone on Fiverr. A few spots through here you’ll see me mention grabbing someone on Fiverr (Fiverr.com). Fiverr is a marketplace where you can buy one-off tasks. Things like installing a theme file, installing a tricky plug-in, moving from one host to another -- these are great Fiverr jobs.
Make sure to set up temp passwords on your accounts or separate log-ins for anyone you allow access via Fiverr -- and when you’re done, actually remove them.Once your theme is installed you’re going to be looking for the option to install demo content, as well as the required plug-ins. Each theme will have a large cache of demo content for you to install. The ability to do so might be under ‘Theme Options’ -- sometimes ‘Customization’ -- in your menu, but once you install this, your site is going to be filled with junk -- good looking junk, but still junk. If you’re better at this, you can sometimes skip this because you can layout from scratch, but if you’re newer, trust the demo content.Required plugins are the same. Your theme will require you to install a few plugins and they should be listed in the admin in a section specifically for that. Install them all, especially any that are page builders -- until you do, your layout and pages will be broken.Getting to this point is about getting SOMETHING up. It won’t be pretty -- some stuff may still be broken and you may have missing images. Some tricky items may have come up depending on your theme and there can always be some headaches. Google is your best friend. Google clearly: “wordpress [x theme] installation problem [error text] 2017”. (Wherein ‘x theme’ is the name of your theme plus them -- hell, it may actually be ‘x theme’ -- as that’s a fairly good theme.)Now that this is done, you have a site with barrels of content you don’t need. It’s annoying to need to delete it all, but it’s better for you to start with something if you’re not a designer and work your way down to what you need than try to come up with how to do a layout consistently from scratch. Start going through your content and start to get a feel for the pages that exist that most closely resemble the pages you need. Just at a guess, you might want some pages like:Home PageAbout UsThe TeamPricingProduct DetailsContact UsFAQMost of your themes are going to have templated pages that cover a big chunk of the above, and you’re just going to be swapping content to make them make sense. Your big goal with these pages is ‘how can I get 90% of what I need with what’s here, not how can I get the 100% perfect site with stuff that doesn’t look anything like what I’ve got?’.Now, some tips on editing the site content using a page builder to get this from being a demo site to being your site. First, find the page you’re going to begin editing (and to be clear, your site is built off of pages -- not posts -- if you’ve never done this before at all). The fastest thing might be to load pages like a regular user and then using the Wordpress nav bar at the top of the screen edit that page. You’re going to want to go with the option that is editing the page using ‘x page-builder’. Some you’ll choose after you load the edit page, some you’ll choose as a link in the header.Once you’ve done this, you’ll be given a layout that is either your exact page or something that shows your content in a bunch of blocks like your page. You can start editing your site content in these blocks usually in a few ways. The content of the block, the style of the content. The style of the block. The style of the row or section the block sits in and the style of the page overall. Think of it like nesting dolls. 
Using each of these, you can dramatically change the content in a section or module of your site.The best tip I can give, though, is -- before you change anything in any field, have some scrap paper and write down what it was before. Make a small change -- just one or two things, then save, then view, then see if you need to go back to what you’ve written. Working like this will allow you to see your edits before you’ve changed so much you can’t go back. Do this especially if you start changing the pixels between items in things like margins and padding.The second best tip is -- steal from what works. Don’t make new sections if you don’t have to. Duplicate other ones that are close to what you like. Find a page you might not be using from the demo content but where you like one piece and open the editor -- find the section for the content you like and then create a new section, copying every single setting and value from the content that already looks right.Either before or after you’re editing the content, you can also usually edit the overall theme of your site. This should sit somewhere in the Wordpress admin menu. It may be its own section under your theme name, or it may be under the section ‘appearance’ and then customization. Using this, you’ll be able to edit the common aspects of your site for the most part. Logo, footer and header content and behavior, fonts, standard page settings and more. Just changing some of these items can take a site from being an exact copy of demo content to looking fairly unique. Same rules apply as above. Don’t change everything at once. Make small changes, view the draft or save and view and then have copied down what you’ve changed in case you need to revert.Honestly guys, there is so much more. I want to go in to detail about getting menus to work right and plugins and optimizing loading -- but if you’re careful about the above you can get a site that looks as good as 90% of startup sites out there, for the most part.
0 notes
hydrus · 7 years ago
Text
Version 324
youtube
windows
zip
exe
os x
app
tar.gz
linux
tar.gz
source
tar.gz
I had a great week. The downloader overhaul is almost done.
pixiv
Just as Pixiv recently moved their art pages to a new phone-friendly, dynamically drawn format, they are now moving their regular artist gallery results to the same system. If your username isn't switched over yet, it likely will be in the coming week.
The change breaks our old html parser, so I have written a new downloader and json api parser. The way their internal api works is unusual and over-complicated, so I had to write a couple of small new tools to get it to work. However, it does seem to work again.
All of your subscriptions and downloaders will try to switch over to the new downloader automatically, but some might not handle it quite right, in which case you will have to go into edit subscriptions and update their gallery manually. You'll get a popup on updating to remind you of this, and if any don't line up right automatically, the subs will notify you when they next run. The api gives all content--illustrations, manga, ugoira, everything--so there unfortunately isn't a simple way to refine to just one content type as we previously could. But it does neatly deliver everything in just one request, so artist searching is now incredibly faster.
Let me know if pixiv gives any more trouble. Now we can parse their json, we might be able to reintroduce the arbitrary tag search, which broke some time ago due to the same move to javascript galleries.
twitter
In a similar theme, given our fully developed parser and pipeline, I have now wangled a twitter username search! It should be added to your downloader list on update. It is a bit hacky and may be ultimately fragile if they change something their end, but it otherwise works great. It discounts retweets and fetches 19/20 tweets per gallery 'page' fetch. You should be able to set up subscriptions and everything, although I generally recommend you go at it slowly until we know this new parser works well. BTW: I think twitter only 'browses' 3200 tweets in the past, anyway. Note that tweets with no images will be 'ignored', so any typical twitter search will end up with a lot of 'Ig' results--this is normal. Also, if the account ever retweets more than 20 times in a row, the search will stop there, due to how the clientside pipeline works (it'll think that page is empty).
Again, let me know how this works for you. This is some fun new stuff for hydrus, and I am interested to see where it does well and badly.
misc
In order to be less annoying, the 'do you want to run idle jobs?' on shutdown dialog will now only ask at most once per day! You can edit the time unit under options->maintenance and processing.
Under options->connection, you can now change max total network jobs globally and per domain. The defaults are 15 and 3. I don't recommend you increase them unless you know what you are doing, but if you want a slower/more cautious client, please do set them lower.
The new advanced downloader ui has a bunch of quality of life improvements, mostly related to the handling of example parseable data.
full list
downloaders:
after adding some small new parser tools, wrote a new pixiv downloader that should work with their new dynamic gallery's api. it fetches all an artist's work in one page. some existing pixiv download components will be renamed and detached from your existing subs and downloaders. your existing subs may switch over to the correct pixiv downloader automatically, or you may need to manually set them (you'll get a popup to remind you).
wrote a twitter username lookup downloader. it should skip retweets. it is a bit hacky, so it may collapse if they change something small with their internal javascript api. it fetches 19-20 tweets per 'page', so if the account has 20 rts in a row, it'll likely stop searching there. also, afaik, twitter browsing only works back 3200 tweets or so. I recommend proceeding slowly.
added a simple gelbooru 0.1.11 file page parser to the defaults. it won't link to anything by default, but it is there if you want to put together some booru.org stuff
you can now set your default/favourite download source under options->downloading
.
misc:
the 'do idle work on shutdown' system will now only ask/run once per x time units (including if you say no to the ask dialog). x is one day by default, but can be set in 'maintenance and processing'
added 'max jobs' and 'max jobs per domain' to options->connection. defaults remain 15 and 3
the colour selection buttons across the program now have a right-click menu to import/export #FF0000 hex codes from/to the clipboard
tag namespace colours and namespace rendering options are moved from 'colours' and 'tags' options pages to 'tag summaries', which is renamed to 'tag presentation'
the Lain import dropper now supports pngs with single gugs, url classes, or parsers--not just fully packaged downloaders
fixed an issue where trying to remove a selection of files from the duplicate system (through the advanced duplicates menu) would only apply to the first pair of files
improved some error reporting related to too-long filenames on import
improved error handling for the folder-scanning stage in import folders--now, when it runs into an error, it will preserve its details better, notify the user better, and safely auto-pause the import folder
png export auto-filenames will now be sanitized of \, /, :, *-type OS-path-invalid characters as appropriate as the dialog loads
the 'loading subs' popup message should appear more reliably (after 1s delay) if the first subs are big and loading slow
fixed the 'fullscreen switch' hover window button for the duplicate filter
deleted some old hydrus session management code and db table
some other things that I lost track of. I think it was mostly some little dialog fixes :/
.
advanced downloader stuff:
the test panel on pageparser edit panels now has a 'post pre-parsing conversion' notebook page that shows the given example data after the pre-parsing conversion has occurred, including error information if it failed. it has a summary size/guessed type description and copy and refresh buttons.
the 'raw data' copy/fetch/paste buttons and description are moved down to the raw data page
the pageparser now passes up this post-conversion example data to sub-objects, so they now start with the correctly converted example data
the subsidiarypageparser edit panel now also has a notebook page, also with brief description and copy/refresh buttons, that summarises the raw separated data
the subsidiary page parser now passes up the first post to its sub-objects, so they now start with a single post's example data
content parsers can now sort the strings their formulae get back. you can sort strict lexicographic or the new human-friendly sort that does numbers properly, and of course you can go ascending or descending--if you can get the ids of what you want but they are in the wrong order, you can now easily fix it! (there is a short sorting sketch just after this list)
some json dict parsing code now iterates through dict keys lexicographically ascending by default. unfortunately, due to how the python json parser I use works, there isn't a way to process dict items in the original order
the json parsing formula now uses a string match when searching for dictionary keys, so you can now match multiple keys here (as in the pixiv illusts|manga fix). existing dictionary key look-ups will be converted to 'fixed' string matches
the json parsing formula can now get the content type 'dictionary keys', which will fetch all the text keys in the dictionary/Object, if the api designer happens to have put useful data in there, wew
formulae now remove newlines from their parsed texts before they are sent to the StringMatch! so, if you are grabbing some multi-line html and want to test for 'Posted: ' somewhere in that mess, it is now easy.
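For anyone wondering what the 'human-friendly' sort means in practice, here is a minimal Python sketch of the general idea -- illustrative only, not the actual parser code, and natural_key is just a name picked for the example:

    import re

    def natural_key(s):
        # Split "page10" into ['page', 10, ''] so runs of digits compare
        # as numbers rather than character-by-character.
        return [int(part) if part.isdigit() else part.lower()
                for part in re.split(r'(\d+)', s)]

    ids = ['page2', 'page10', 'page1']
    print(sorted(ids))                                 # lexicographic: ['page1', 'page10', 'page2']
    print(sorted(ids, key=natural_key))                # human-friendly: ['page1', 'page2', 'page10']
    print(sorted(ids, key=natural_key, reverse=True))  # and descending works the same way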
next week
After slaughtering my downloader overhaul megajob of redundant and completed issues (bringing my total todo from 1568 down to 1471!), I only have 15 jobs left to go. It is mostly some quality of life stuff and refreshing some out of date help. I should be able to clear most of them out next week, and the last few can be folded into normal work.
So I am now planning the login manager. After talking with several users over the past few weeks, I think it will be fundamentally very simple, supporting any basic user/pass web form, and will relegate complicated situations to some kind of improved browser cookies.txt import workflow. I suspect it will take 3-4 weeks to hash out, and then I will be taking four weeks to update to python 3, and then I am a free agent again. So, absent any big problems, please expect the 'next big thing to work on poll' to go up around the end of October, and for me to get going on that next big thing at the end of November. I don't want to finalise what goes on the poll yet, but I'll open up a full discussion as the login manager finishes.
1 note · View note
margdarsanme · 5 years ago
Text
NCERT Class 12 Computer Science Chapter 4 Database Concepts
NCERT Class 12 Computer Science Python Solutions for Chapter 4 :: Database Concepts
Short Answer Type Questions-I
Question 1: Observe the following PARTICIPANTS and EVENTS tables carefully and write the name of the RDBMS operation which will be used to produce the output shown in RESULT. Also, find the Degree and Cardinality of RESULT.
Answer: Cartesian Product; Degree = 4, Cardinality = 6

Question 2: Define degree and cardinality. Also, based upon the given table, write its degree and cardinality.
Answer: Degree is the number of attributes or columns present in a table. Cardinality is the number of tuples or rows present in a table. For the Patients table: Degree = 4, Cardinality = 5
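Since these are the Python solutions, it is worth noting that degree and cardinality map directly onto a DataFrame's shape; a small illustrative sketch with made-up patient data:

    import pandas as pd

    # A toy "Patients" table with 4 attributes (columns) and 5 tuples (rows).
    patients = pd.DataFrame({
        'PatNo':   [1, 2, 3, 4, 5],
        'Name':    ['Asha', 'Ravi', 'Meena', 'John', 'Sara'],
        'Dept':    ['ENT', 'Ortho', 'ENT', 'Cardio', 'Ortho'],
        'Charges': [300, 450, 300, 800, 450],
    })

    cardinality, degree = patients.shape  # shape is (rows, columns)
    print(degree)       # 4 -> number of columns (attributes)
    print(cardinality)  # 5 -> number of rows (tuples)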
Question 3: Observe the following table and answer parts (i) and (ii):
(i) In the above table, can we have Qty as the primary key?
(ii) What is the cardinality and degree of the above table?
Answer:
(i) We cannot use Qty as the primary key because its values are duplicated, and a primary key value cannot be duplicated.
(ii) Degree = 4, Cardinality = 5
Question 4: Explain the concept of union between two tables, with the help of an appropriate example.
Answer: The union operation, denoted by 'U', combines two or more relations. The result of a union operation contains the tuples that are in either of the tables or in both tables, with duplicate tuples eliminated.
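In pandas (and very much in the spirit of deleting duplicate rows in Python), a union can be sketched by concatenating two DataFrames with the same columns and dropping the duplicate rows; the data below is invented purely for illustration:

    import pandas as pd

    r = pd.DataFrame({'RollNo': [1, 2, 3], 'Name': ['Amit', 'Bina', 'Chetan']})
    s = pd.DataFrame({'RollNo': [3, 4], 'Name': ['Chetan', 'Deepa']})

    # Union: all tuples in R or S (or both), with duplicate rows removed.
    union = pd.concat([r, s], ignore_index=True).drop_duplicates()
    print(union)  # 4 unique rows: RollNo 1, 2, 3, 4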
Question 5: Observe the following STUDENTS and EVENTS tables carefully and write the name of the RDBMS operation which will be used to produce the output shown in the LIST table. Also, find the degree and cardinality of the table.
Answer: Cartesian Product; Degree = 4, Cardinality = 6
Question 6:Observe the following MEMBER and ACTIVITY tables carefully and write the name of the RDBMS operation, which will be used to produce the output as shown in REPORT? Also, find the Degree and Cardinality of the REPORT.
Answer: Union operation (MEMBER U ACTIVITY); Degree of REPORT = number of columns (attributes) = 3; Cardinality of REPORT = number of rows (tuples) = 6
Question 7: Observe the table 'Club' given below:
(i) What is the cardinality and degree of the given table?
(ii) If a new column Contact_No is added and three more members join the club, what will the cardinality and degree become?
Answer:
(i) Cardinality = 4, Degree = 5
(ii) Cardinality = 7, Degree = 6
Question 8: What do you understand by Union and Cartesian product in relational algebra?
Answer: Union of R and S: The union of two relations is a relation that includes all the tuples that are in R, in S, or in both R and S; duplicate tuples are eliminated. Union is an operator that works on two sets: it combines the tuples of one relation with all the tuples of the other relation such that there is no duplication.
Cartesian Product: The Cartesian product is an operator which works on two sets. It pairs each tuple of one relation with every tuple of the other relation.
Example: Cartesian Product
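The Cartesian product can likewise be sketched in pandas with a cross merge; the tables below are invented for illustration:

    import pandas as pd

    member = pd.DataFrame({'MNo': [1, 2], 'MName': ['Ravi', 'Sita']})
    activity = pd.DataFrame({'ACode': [10, 20, 30], 'AName': ['Chess', 'Yoga', 'Swimming']})

    # Cartesian product: every row of `member` paired with every row of `activity`.
    product = member.merge(activity, how='cross')  # how='cross' needs pandas >= 1.2
    print(product.shape)  # (6, 4): cardinality 2*3 = 6, degree 2+2 = 4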
Question 9: Differentiate between the Primary key and the Alternate key of a table with the help of an example.
Answer: Primary Key: a primary key is a value that can be used to identify a unique row in a table. Alternate Key: an alternate key is any candidate key which is not selected to be the primary key.
Example: (Bank Account Number, Aadhaar Number) are candidate keys for the table. Aadhaar Number — primary key; Bank Account Number — alternate key.

Question 10: Explain the concept of candidate key with the help of an appropriate example.
Answer: A candidate key is a column or set of columns that can identify records uniquely. For example, consider a table STUDENT: here AdmnNo and RollNo each identify a row uniquely, hence they are candidate keys.
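A quick Python way to check whether a column behaves like a candidate key is to test it for duplicate values; a small illustrative sketch with invented student data:

    import pandas as pd

    student = pd.DataFrame({
        'AdmnNo': [1001, 1002, 1003],
        'RollNo': [1, 2, 3],
        'Name':   ['Asha', 'Ravi', 'Meena'],
    })

    # A column qualifies as a candidate key only if it contains no duplicate values.
    print(student['AdmnNo'].is_unique)                # True  -> candidate key
    print(student['RollNo'].is_unique)                # True  -> candidate key
    print(student.duplicated(subset=['Name']).any())  # False here, but Name could repeat in general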
Question 11: What do you understand by the degree and cardinality of a table?
Answer: Degree refers to the number of columns in a table; cardinality refers to the number of rows.

Question 12: Observe the following table and answer parts (i) and (ii) accordingly.
(i) In the above table, can we take Mno as the primary key? (Answer as [Yes/No] only.) Justify your answer with a valid reason.
(ii) What is the degree and the cardinality of the above table?
Answer:
(i) No, because Pencil and Eraser have the same Mno = 2, and a primary key must be unique.
(ii) Degree = 4, Cardinality = 5

Question 13: Give a suitable example of a table with sample data and illustrate the Primary and Candidate keys in it.
Answer: A table may have more than one attribute, or group of attributes, that identifies a row/tuple uniquely; all such attributes are known as candidate keys. Out of the candidate keys, one is selected as the primary key. For the sample table: Id = primary key; Id and Qty = candidate keys.
Question 14: What do you understand by the selection and projection operations in relational algebra?
Answer: Projection (π): In relational algebra, projection is a unary operation written as π_{a1,…,an}(R). The result of such a projection is the set obtained when the components of the tuples of R are restricted to the attributes {a1,…,an} – it discards (or excludes) the other attributes.
Selection (σ): In relational algebra, a selection is a unary operation written as σ_{aθb}(R) or σ_{aθv}(R), where:
a and b are attribute names
θ is a binary comparison operator from the set {<, ≤, =, ≠, ≥, >}
v is a value constant
R is a relation
The selection σ_{aθb}(R) selects all those tuples in R for which θ holds between the a attribute and the b attribute.
Example: Selection and Projection
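In pandas terms, selection corresponds to filtering rows and projection to picking columns; a minimal illustrative sketch with invented data:

    import pandas as pd

    student = pd.DataFrame({
        'RollNo': [1, 2, 3, 4],
        'Name':   ['Asha', 'Ravi', 'Meena', 'John'],
        'Marks':  [91, 67, 78, 85],
    })

    # Selection sigma_{Marks > 80}(student): keep only the tuples satisfying the condition.
    selected = student[student['Marks'] > 80]

    # Projection pi_{Name, Marks}(student): keep only the listed attributes, discard the rest.
    projected = student[['Name', 'Marks']]

    print(selected)
    print(projected)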
Question 15: What do you understand by Primary key and Candidate keys?
Answer: An attribute, or set of attributes, used to identify a tuple uniquely is known as the primary key. If a table has more than one such attribute that identifies a tuple uniquely, then all such attributes are known as candidate keys.

Question 16: What is a relation? Define the relational data model.
Answer: A relation is a table having atomic values, unique rows, and unordered rows and columns. The relational model represents data and the relationships among data by a collection of tables known as relations, each of which has a number of columns with unique names.

Question 17: Define domain with respect to a database. Give an example.
Answer: A domain is a pool of values from which the actual values appearing in a given column are drawn. For example, the values appearing in the Supp# column of both the Suppliers table and the Shipment table are drawn from the same domain.
Question 18: Expand the following:
SQL
DBMS
Answer:
SQL – Structured Query Language.
DBMS – Data Base Management System.
Question 19: What do you understand by candidate keys in a table? Give a suitable example of candidate keys from a table containing some meaningful data.
Answer: Candidate key: a candidate key is one that can identify each row of a table uniquely. Generally, a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them becomes the primary key and the rest are called alternate keys.
Example:

Question 20: What are all the domain values possible for gender?
Answer: Male and Female
Question 21: A table 'customer' has 10 columns but no rows. Later, 10 new rows are inserted and 3 rows are deleted. What is the degree and cardinality of the table customer?
Answer: Degree = 10 [no. of columns]; Cardinality = 10 - 3 = 7 [no. of rows]

Question 22: A table 'student' has 3 columns and 10 rows, and another table 'student 2' has the same columns as student but 15 rows. 5 rows are common to both tables. If we take the union, what is the degree and cardinality of the resultant table?
Answer: Degree = 3; Cardinality = 10 + 15 - 5 = 20 (the rows common to both tables are counted only once)
Question 23: A table 'student' has 4 columns and 10 rows and 'student 2' has 5 columns and 5 rows. If we take the Cartesian product of these two tables, what is the degree and cardinality of the resultant table?
Answer: Degree = 4 + 5 = 9 [no. of columns]; Cardinality = 10 x 5 = 50 [no. of rows]
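The arithmetic can be sanity-checked by building two throwaway DataFrames of the stated sizes and inspecting the shape of their cross join; purely an illustrative check:

    import pandas as pd

    # 10 rows x 4 columns and 5 rows x 5 columns; the contents are irrelevant here.
    student = pd.DataFrame(0, index=range(10), columns=['a', 'b', 'c', 'd'])
    student2 = pd.DataFrame(0, index=range(5), columns=['p', 'q', 'r', 's', 't'])

    product = student.merge(student2, how='cross')  # pandas >= 1.2
    print(product.shape)  # (50, 9): cardinality 10*5, degree 4+5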
Question 24: In the following two tables, find the union of Student 1 and Student 2.
Answer:
via Blogger https://ift.tt/2RrnZJs
0 notes
hydrus · 8 years ago
Text
Version 265
youtube
windows
zip
exe
os x
app
tar.gz
linux
tar.gz
source
tar.gz
I had a great week. Imports are far less laggy and I've moved some longer term gui stuff forward.
PROTIP for users running from source: The client now uses the python package 'matplotlib' to draw some charts. It isn't needed to boot, but you may like to add it when convenient.
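For source users wondering what matplotlib is being used for, a simple bar chart of monthly totals only takes a few lines; this is a generic, illustrative sketch with invented numbers, not the client's actual charting code:

    import matplotlib.pyplot as plt

    # Invented monthly bandwidth totals, in MB, purely for illustration.
    months = ['May', 'Jun', 'Jul', 'Aug']
    usage_mb = [1200, 950, 1800, 1400]

    plt.bar(months, usage_mb)
    plt.ylabel('bandwidth used (MB)')
    plt.title('monthly usage for a network context')
    plt.show()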
faster imports
I have shuffled around how new files are inspected and processed in the client. Imports' heavy CPU work (like thumbnail generation and accurate video frame counting) now occurs in a separate space that causes less gui lag. You should notice all imports are less heavy on your overall browsing experience, but particularly so for webm threads and similar 'video' gallery queries.
I am really pleased with this change, which on my machines seems to have basically removed all noticeable import lag, but please let me know if you discover particular files or general import situations that still cause problems.
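The general pattern described here -- doing the CPU-expensive part of an import off the UI path and only touching the gui/database with the finished result -- can be sketched roughly like this in Python. This is an illustrative sketch of the idea only, with invented names like generate_metadata, not the client's actual import code:

    import threading
    import queue

    work_queue = queue.Queue()

    def generate_metadata(path):
        # Placeholder for the expensive bits: thumbnail generation,
        # accurate video frame counting, hashing, and so on.
        return {'path': path, 'thumbnail': b'...', 'num_frames': 123}

    def import_worker():
        # Runs in a background thread so the heavy work never blocks the gui.
        while True:
            path = work_queue.get()
            metadata = generate_metadata(path)
            # Only the cheap, final commit step would happen under the main db/gui locks.
            print('ready to commit', metadata['path'])
            work_queue.task_done()

    threading.Thread(target=import_worker, daemon=True).start()
    work_queue.put('some_video.webm')
    work_queue.join()  # wait for the queued import to finish before exiting this demo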
more bandwidth gui
I did not have time to convert any more downloaders to the new networking engine, but I did flesh out the existing bandwidth management gui. services->review bandwidth usage, now lets you review and edit bandwidth rules for specific network contexts and the default contexts that are otherwise used. I've also added a prototype bar chart to display monthly historical usage for specific contexts.
I have attempted to make the default bandwidth rules fairly simple and neither too aggressive nor lax. They mostly just stop you downloading too much garbage by mistake or being too rude to the servers you download from. If you are just a regular user with regular consumption habits, you can safely leave them alone. The feedback on this stuff in action will improve in the coming weeks.
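To make the 'rules per network context' idea a bit more concrete, here is one way to picture it in Python -- an invented, simplified structure for illustration only, not the client's real rule format or API:

    import time
    from collections import defaultdict

    # Per network context, a list of (time delta in seconds, max requests) limits.
    # A specific context falls back to the defaults if it has no rules of its own.
    rules = {
        'default':            [(60, 30), (86400, 5000)],
        'domain:example.com': [(60, 5)],
    }

    request_history = defaultdict(list)  # context -> timestamps of past requests

    def can_start_request(context):
        # Check every applicable limit against recent usage; record the request if allowed.
        now = time.time()
        applicable = rules.get(context, rules['default'])
        history = request_history[context]
        for time_delta, max_requests in applicable:
            recent = [t for t in history if now - t < time_delta]
            if len(recent) >= max_requests:
                return False
        history.append(now)
        return True

    print(can_start_request('domain:example.com'))  # True until the per-minute limit is hit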
I have thought about these rules a lot. If/when you are comfortable with my network context system, let me know what you think.
There is a bit more to do here and possibly some bugs to fix. The system can get complicated, so I may need some more help or to hide some features behind the advanced mode, so feedback would be appreciated. I also had problems with this new dialog crashing some clients. I think I have it fixed now, but please let me know if you get it yourself.
database migration
The database->database migration dialog is also coming along. It now has some working buttons to move your files around and some unfinished help. I expect to have it ready for regular users in the next week or two, but if you are experienced, please feel free to check it out.
The old options->client files locations panel is now gone, as is the database->rebalance job. From now on, everything will be handled in this new dialog.
full list
the bandwidth engine now recognises individual thread watcher threads as a network context that can inherit default bandwidth rules
tweaked default bandwidth rules and reset existing rules to this new default
review all bandwidth frame now has a time delta button to choose how the network contexts are filtered
review all bandwidth frame now updates itself every 20 seconds or so
review all bandwidth frame now has a 'delete history' button
review all bandwidth frame now shows if services have specific rules
review all bandwidth frame now has an 'edit default rules' button that lets you select and set rules for default network contexts
review network context bandwidth frame now has a bar chart to show historical usage!
bar chart is optional based on matplotlib availability
review network context bandwidth frame now lists current bandwidth rules and current usage. it says whether these are default or specific rules
review network context bandwidth frame now has a button to edit/clear specific rules
rows of bandwidth rules and current usage, where presented in ui, are now ordered in ascending time delta
misc bandwidth code improvements
client file imports are now bundled into their own job object that generates cpu-expensive file metadata outside of the main file and database locks. file imports are now much less laggy and should generally block the feel of the ui much less
removed the database 'rebalance files' menu entry
removed the 'client files location' page from options
db client_files rebalance will no longer occur in idle or shutdown time
(this stuff is now handled in the migrate database dialog)
'migrate database' now uses a dialog, meaning you cannot interact with the rest of the program while it is open
migrate database now has file location editing verbs--add, remove, +/- weight, rebalance_now. thumbnail location and portable db migration will be added next week
flushed out the backup guide in the getting started help, including to reflect the new internal process
the client now saves the 'last session' gui session before running a database backup
the shutdown maintenance yes/no dialog will now auto-no after 15 seconds
gave status bar tabs a bit more space for their text (some window managers were cutting them off)
tumblr api lookups are now https
tumblr files uploaded pre-2013 will no longer receive the 68. subdomain stripping, as they are not supported at the media.tumblr.com domain (much like 'raw' urls)
pages will now not 'start' their download queues or thread checkers or whatever data checking loops they have until their initial media results are loaded
key events started from an autocomplete entry but consumed by a higher window (typically F5 or F9/ctrl+t for refresh or new page at the main gui level) will no longer be duplicated
fixed a shutdown issue with network job controls that could break a clean shutdown in some circumstances
if the user attempts to create more than 128 pages, the client will now instead complain with a popup message. Due to OS-based gui handle limits, more than this many pages increasingly risks a crash
if the client has more than 128 pages both open and waiting in the undo menu, it will destroy the 'closed' ones
next week
Moving boorus and some other 'gallery' downloaders to the new network engine is the top priority, but I would like to finish the database migration stuff by adding external thumbnail location options and improving the gui feedback as it moves files around.
0 notes